Purpose

QC metrics have been used to assess pipeline quality. Here we propose an analysis that uses Picard QC metrics to evaluate the effect of pipeline updates or changes.

Input data

The key input data for this test are two sets of Picard tools QC metrics, one from each of the two pipelines being compared.

First, load the functions we will use in this test.

Here is the list of input parameters.

Parse QC Metrics Files

Each QC metrics file is a collection of Picard tools QC outputs, formatted as an N*M matrix. The rows are QC metrics and the columns are cells/samples.
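As a rough illustration of this layout, the sketch below reads one metrics file into a numeric matrix with metrics as rows and cells as columns. The helper name `parse_metrics`, the CSV delimiter, and the toy file contents are assumptions made for the example, not details taken from the pipelines.

```r
# Hypothetical helper: read one QC metrics file into a numeric matrix.
# Assumes a delimited text file whose first column holds metric names and
# whose header row holds cell/sample IDs; change `sep` for other delimiters.
parse_metrics <- function(path, sep = ",") {
  df <- read.table(path, header = TRUE, sep = sep, row.names = 1,
                   check.names = FALSE, stringsAsFactors = FALSE)
  as.matrix(df)
}

# Self-contained demonstration with a temporary file and made-up values.
tmp <- tempfile(fileext = ".csv")
writeLines(c("metric,cell_A,cell_B",
             "PCT_CODING_BASES,0.41,0.39",
             "MEDIAN_INSERT_SIZE,210,198"), tmp)
metrics <- parse_metrics(tmp)

dim(metrics)       # number of metrics (rows) by number of cells (columns)
rownames(metrics)  # metric names
```

Setting `check.names = FALSE` keeps cell IDs exactly as written in the file, which matters later when IDs are matched between pipelines.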

Here is the list of metrics we will analyze:

##  [1] "detected_ratio"               "MT_ratio"                    
##  [3] "PCT_PF_READS_ALIGNED"         "PCT_PF_READS_IMPROPER_PAIRS" 
##  [5] "PCT_READS_ALIGNED_IN_PAIRS"   "PCT_CODING_BASES"            
##  [7] "PCT_INTERGENIC_BASES"         "PCT_INTRONIC_BASES"          
##  [9] "PCT_UTR_BASES"                "PCT_USABLE_BASES"            
## [11] "PCT_MRNA_BASES"               "PCT_RIBOSOMAL_BASES"         
## [13] "PERCENT_DUPLICATION"          "MEDIAN_5PRIME_TO_3PRIME_BIAS"
## [15] "MEDIAN_3PRIME_BIAS"           "MEDIAN_5PRIME_BIAS"          
## [17] "MEDIAN_CV_COVERAGE"           "PF_MISMATCH_RATE"            
## [19] "MEDIAN_INSERT_SIZE"

First, we match the column names (cell IDs) between the two pipelines' metrics.
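One way this matching step could look is sketched below, assuming each pipeline's metrics are held in a matrix with cell IDs as column names. The object names and values are invented for illustration.

```r
# Toy matrices standing in for the two pipelines' metrics
# (rows = metrics, columns = cell IDs). All values are made up.
base <- matrix(c(0.40, 0.10, 0.42, 0.12, 0.38, 0.09), nrow = 2,
               dimnames = list(c("PCT_CODING_BASES", "MT_ratio"),
                               c("cell_A", "cell_B", "cell_C")))
updated <- matrix(c(0.37, 0.08, 0.41, 0.11), nrow = 2,
                  dimnames = list(c("PCT_CODING_BASES", "MT_ratio"),
                                  c("cell_C", "cell_A")))

# Keep only the cells present in both matrices, in the same column order.
shared <- intersect(colnames(base), colnames(updated))
base_m <- base[, shared, drop = FALSE]
updated_m <- updated[, shared, drop = FALSE]
```

Indexing both matrices by the same `shared` vector aligns the columns, so any later cell-by-cell comparison pairs the correct values. `drop = FALSE` keeps the result a matrix even when only one cell is shared.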

Then we select the subset of QC metrics to analyze.
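The selection step might be sketched as follows. The names in `metrics_to_test` are taken from the list above, while the toy matrix and the handling of missing metrics are assumptions for the example.

```r
# A few metric names from the list above; extend as needed.
metrics_to_test <- c("detected_ratio", "MT_ratio", "PCT_CODING_BASES")

# Toy metrics matrix (rows = metrics, columns = cells); values are made up.
qc <- matrix(c(0.10, 0.40, 1e6, 0.12, 0.42, 2e6), nrow = 3,
             dimnames = list(c("MT_ratio", "PCT_CODING_BASES", "TOTAL_READS"),
                             c("cell_A", "cell_B")))

# Report requested metrics that are absent instead of failing on them.
missing_metrics <- setdiff(metrics_to_test, rownames(qc))
if (length(missing_metrics) > 0) {
  message("Metrics not found: ", paste(missing_metrics, collapse = ", "))
}

# Keep only the requested metrics that exist in the matrix.
keep <- intersect(metrics_to_test, rownames(qc))
qc_sub <- qc[keep, , drop = FALSE]
```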

Finally, we have two QC metrics matrices, one per pipeline, to run the analysis on.

Analysis

To evaluate the pipeline changes/updates, we will carry out statistical tests on each QC metric. These tests are described and visualized as follows:

Let’s run the analysis mentioned above on the set of QC metrics.
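This section does not name the specific tests, so the sketch below is only an illustration of a per-metric comparison. It assumes a paired Wilcoxon signed-rank test on matched cells with Benjamini-Hochberg adjustment across metrics; both choices, and the simulated data, are assumptions rather than the method used in this report.

```r
set.seed(1)  # make the simulated example reproducible

# Simulated matched metrics for 20 cells (rows = metrics, columns = cells).
cells <- paste0("cell_", 1:20)
metric_names <- c("MT_ratio", "PCT_CODING_BASES")
base <- matrix(runif(40), nrow = 2, dimnames = list(metric_names, cells))
updated <- base + matrix(rnorm(40, sd = 0.01), nrow = 2)

# One paired test per metric, comparing the same cells across pipelines.
p_values <- sapply(metric_names, function(m) {
  wilcox.test(base[m, ], updated[m, ], paired = TRUE)$p.value
})

# Collect raw and multiple-testing-adjusted p-values in one table.
results <- data.frame(metric = metric_names,
                      p_value = p_values,
                      p_adjusted = p.adjust(p_values, method = "BH"))
print(results)
```

A paired test fits here because each cell is measured by both pipelines. Adjusting across metrics limits false positives when many metrics are tested at once.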

Outputs